{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"align-center\">\n",
    "<a href=\"https://oumi.ai/\"><img src=\"https://oumi.ai/docs/en/latest/_static/logo/header_logo.png\" height=\"200\"></a>\n",
    "\n",
    "[![Documentation](https://img.shields.io/badge/Documentation-latest-blue.svg)](https://oumi.ai/docs/en/latest/index.html)\n",
    "[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)\n",
    "[![GitHub Repo stars](https://img.shields.io/github/stars/oumi-ai/oumi)](https://github.com/oumi-ai/oumi)\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/oumi-ai/oumi/blob/main/notebooks/Oumi - Using NanoGPT.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "</div>\n",
    "\n",
    "👋 Welcome to Open Universal Machine Intelligence (Oumi)!\n",
    "\n",
    "🚀 Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from [data preparation](https://oumi.ai/docs/en/latest/resources/datasets/datasets.html) and [training](https://oumi.ai/docs/en/latest/user_guides/train/train.html) to [evaluation](https://oumi.ai/docs/en/latest/user_guides/evaluate/evaluate.html) and [deployment](https://oumi.ai/docs/en/latest/user_guides/launch/launch.html). Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.\n",
    "\n",
    "🤝 Make sure to join our [Discord community](https://discord.gg/oumi) to get help, share your experiences, and contribute to the project! If you are interested in joining one of the community's open-science efforts, check out our [open collaboration](https://oumi.ai/community) page.\n",
    "\n",
    "⭐ If you like Oumi and you would like to support it, please give it a star on [GitHub](https://github.com/oumi-ai/oumi)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Adapting NanoGPT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "❗**NOTICE:** We recommend running this notebook on a GPU. If running on Google Colab, you can use the free T4 GPU runtime (Colab Menu: `Runtime` -> `Change runtime type`).\n",
    "\n",
    "The goal of this notebook is to show how to use a custom model with Oumi.\n",
    "\n",
    "In this case, we will adapt [nanogpt](https://github.com/karpathy/nanoGPT), and train it with the HuggingFace training loop."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "### Oumi Installation\n",
    "\n",
    "First, let's install Oumi and tiktoken. You can find more detailed instructions [here](https://oumi.ai/docs/en/latest/get_started/installation.html).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install oumi tiktoken"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clone nanoGPT"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, clone the nanoGPT repository and add it to our path:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "module_folder = Path(\"/tmp/oumi/nanoGPT\")\n",
    "\n",
    "# Clone the nanoGPT repo\n",
    "if not module_folder.is_dir():\n",
    "    module_folder.mkdir(parents=True, exist_ok=True)\n",
    "    !git clone https://github.com/karpathy/nanoGPT {module_folder}\n",
    "else:\n",
    "    print(\"nanoGPT already cloned!\")\n",
    "\n",
    "sys.path.append(str(module_folder))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adapting nanoGPT model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.nn.functional as F\n",
    "from model import GPT, GPTConfig  # import from ~/nanoGPT/model.py\n",
    "\n",
    "from oumi.core import registry\n",
    "\n",
    "\n",
    "@registry.register(\"oumi-nanoGPT\", registry_type=registry.RegistryType.MODEL)\n",
    "class OumiNanoGPT(GPT):\n",
    "    def __init__(self, **kwargs):\n",
    "        \"\"\"Initializes an instance of the class.\"\"\"\n",
    "        gpt_config = GPTConfig()\n",
    "        gpt_config.bias = False\n",
    "\n",
    "        super().__init__(gpt_config)\n",
    "\n",
    "    def forward(self, input_ids, labels=None, attention_mask=None):\n",
    "        \"\"\"Performs the forward pass of the model.\"\"\"\n",
    "        # Update the return format to be compatible with our Trainer.\n",
    "        logits, loss = super().forward(idx=input_ids, targets=labels)\n",
    "        outputs = {\"logits\": logits}\n",
    "        if loss:\n",
    "            outputs[\"loss\"] = loss\n",
    "        return outputs\n",
    "\n",
    "    def criterion(self):\n",
    "        \"\"\"Returns the criterion used for calculating the loss.\"\"\"\n",
    "        return F.cross_entropy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "Ok now we are ready to train our model! we can start from the default gpt2 config, and edit as needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import oumi\n",
    "from oumi.core.configs import TrainingConfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile /tmp/oumi/nanoGPT/gpt_train.yaml\n",
    "\n",
    "model:\n",
    "  model_name: \"gpt2\" # 124M params\n",
    "  model_max_length: 128\n",
    "  torch_dtype_str: \"bfloat16\"\n",
    "  tokenizer_pad_token: \"<|endoftext|>\"\n",
    "  load_pretrained_weights: False\n",
    "  trust_remote_code: True\n",
    "  model_kwargs:\n",
    "    disable_dropout: True\n",
    "\n",
    "data:\n",
    "  train:\n",
    "    datasets:\n",
    "      - dataset_name: \"HuggingFaceFW/fineweb-edu\"\n",
    "        subset: \"sample-10BT\"\n",
    "        split: \"train\"\n",
    "        dataset_kwargs:\n",
    "          seq_length: 128\n",
    "    stream: True\n",
    "    pack: True\n",
    "    target_col: \"text\"\n",
    "\n",
    "training:\n",
    "  trainer_type: \"TRL_SFT\"\n",
    "  per_device_train_batch_size: 2\n",
    "  max_steps: 10\n",
    "\n",
    "  # If enabled, reduces memory consumption by ~3x but causes a 30% training slowdown.\n",
    "  enable_gradient_checkpointing: False\n",
    "  gradient_checkpointing_kwargs:\n",
    "    use_reentrant: False\n",
    "\n",
    "  # https://github.com/karpathy/build-nanogpt/blob/master/train_gpt2.py#L349\n",
    "  learning_rate: 6.0e-04\n",
    "  lr_scheduler_type: \"cosine_with_min_lr\"\n",
    "  lr_scheduler_kwargs:\n",
    "    min_lr_rate: 0.1\n",
    "  warmup_steps: 715\n",
    "  adam_beta1: 0.9\n",
    "  adam_beta2: 0.95\n",
    "  weight_decay: 0.1\n",
    "\n",
    "  run_name: \"gpt2_pt\"\n",
    "  output_dir: \"output/gpt2.pt\"\n",
    "  include_performance_metrics: True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Starting from the default GPT-2 config\n",
    "config_path = \"/tmp/oumi/nanoGPT/gpt_train.yaml\"\n",
    "config = TrainingConfig.from_yaml(config_path)\n",
    "\n",
    "config.training.output_dir = \"nanogpt_tutorial\"\n",
    "# Update to use our newly registered nanoGPT model\n",
    "config.model.model_name = \"oumi-nanoGPT\"  # needs to match the registered model name\n",
    "# We do not have a custom tokenizer, but we can use the GPT-2 tokenizer from HuggingFace\n",
    "config.model.tokenizer_name = \"gpt2\"\n",
    "# These are needed specifically to get nanoGPT to work, and likely aren't needed for\n",
    "# other custom models.\n",
    "config.training.enable_tensorboard = False\n",
    "config.training.save_steps = 0\n",
    "config.training.save_final_model = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oumi.train(config)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "oumi",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
